System and method for identifying faces in unconstrained media

ABSTRACT

Methods and systems for facial recognition are provided. The method includes determining a three-dimensional (3D) model of a face of an individual based on different images of the individual. The method also includes extracting two-dimensional (2D) patches from the 3D model. Further, the method includes generating a plurality of signatures of the face using different combinations of the 2D patches, wherein the plurality of signatures correspond to respective views of the 3D model from different angles.

RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 14/576,818, filed Dec. 19, 2014, which claims benefit of prior provisional Application No. 61/918,205, filed Dec. 19, 2013, and prior provisional Application No. 61/968,015, filed Mar. 20, 2014, the entire disclosures of which are incorporated herein by reference.

FIELD

This disclosure relates to systems and methods for recognizing images in media, and more specifically, to facial recognition.

BACKGROUND

Cameras have become common in mobile devices, surveillance sensors, and law enforcement vehicles. Due to their mobility, such cameras can record images of individuals in a variety of unconstrained conditions. That is, in contrast to a staged mug shot, faces of individuals recorded under unconstrained conditions can vary greatly due to changes in lighting (e.g., natural and artificial), attributes of the individual's face (e.g., age, facial hair, glasses), viewing angle (e.g., pitch and yaw), occlusions (e.g., signs, trees, etc.), and the like. For example, a wrongdoer may perform an illegal act at a crowded event. Around a time of the act, bystanders may capture images of the wrongdoer while recording the event using their mobile cameras. Additionally, security cameras monitoring the event may capture images of the wrongdoer from different (e.g., elevated) perspectives. Coincidentally, the images of the wrongdoer may have been captured by a number of cameras having different perspectives and occlusions. The recordings may be accessed by law enforcement authorities from operators of the cameras, social networking websites, and media outlets. However, attempting to identify the wrongdoer from the various recordings can require sifting through an enormous amount of image data.

SUMMARY

The present disclosure provides a method including determining a three-dimensional (3D) model of a face of an individual based on different images of the individual. The method also includes extracting two-dimensional (2D) patches from the 3D model. Further, the method includes generating a plurality of signatures of the face using different combinations of the 2D patches, wherein the plurality of signatures correspond to respective views of the 3D model from different angles.

Additionally, the present disclosure provides a facial recognition system, including a processor, a storage system comprising a computer-readable hardware storage device, and program instructions stored on the computer-readable hardware storage device for execution by the processor. The program instructions include program instructions that determine a three-dimensional (3D) model of a face of an individual based on different images of the individual. The program instructions also include program instructions that extract two-dimensional (2D) patches from the 3D model. Further, the program instructions include program instructions that generate a plurality of signatures of the face using different combinations of the 2D patches, wherein the plurality of signatures correspond to respective views of the 3D model from different angles.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the present teachings and, together with the description, serve to explain the principles of the disclosure.

FIG. 1 illustrates a block diagram of an exemplary environment for implementing systems and processes in accordance with aspects of the present disclosure;

FIG. 2 illustrates a functional block diagram of an exemplary facial recognition system in accordance with aspects of the present disclosure;

FIG. 3 illustrates a flow diagram of an exemplary process for recognizing faces in accordance with aspects of the present disclosure;

FIG. 4 illustrates a flow diagram of an exemplary process for determining an attribute-based representation using a facial recognition system in accordance with aspects of the present disclosure;

FIG. 5 illustrates a flow diagram of an exemplary process for determining attributes using a facial recognition system in accordance with aspects of the present disclosure; and

FIG. 6 illustrates a flow diagram of an exemplary process for determining a multi-view PEP signature using a facial recognition system in accordance with aspects of the present disclosure.

It should be noted that some details of the figures have been simplified and are drawn to facilitate understanding of the present teachings, rather than to maintain strict structural accuracy, detail, and scale.

DETAILED DESCRIPTION

This disclosure relates to systems and methods for recognizing images in media, and more specifically, to facial recognition. In accordance with aspects of the present disclosure, the system and method can be used to recognize an individual in images based on an attribute-based representation of the individual's face. The attribute-based representation comprises multi-view probabilistic elastic parts (“multi-view PEP”) signatures determined using 2D patches extracted from the images and attributes that semantically characterize the individual's face (e.g., gender, age, ethnicity, etc.). The multi-view PEP signatures are determined using attribute-specific PEP models built from 2D face patches extracted from a 3D model. A PEP model is a local spatial-appearance feature-based Gaussian mixture model. The 3D model is constructed from different poses of the face obtained from images of the individual in photographs, videos, and/or sketches. Advantageously, the attribute-based representation accounts for geometric, structural, and photometric variability occurring in the individual's face due to viewpoint, illumination, aging, and expressions, while preserving invariant features that can be used to uniquely discriminate the individual's face from others.
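
By way of a concrete, non-limiting illustration, the following sketch shows one way a PEP-style model could be built as a Gaussian mixture over local spatial-appearance features (patch location concatenated with a simple appearance descriptor). The patch size, stride, number of components, and the crude normalized-pixel descriptor are assumptions made for the example, not a description of the patented implementation.

```python
# Minimal sketch of a PEP-style model: a Gaussian mixture fit over local
# spatial-appearance features (patch location plus an appearance descriptor).
# All parameter values below are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def extract_spatial_appearance_features(image, patch=16, step=8):
    """Densely sample patches of a grayscale face image; describe each as [x, y, appearance...]."""
    h, w = image.shape
    feats = []
    for y in range(0, h - patch + 1, step):
        for x in range(0, w - patch + 1, step):
            appearance = image[y:y + patch, x:x + patch].ravel().astype(float)
            appearance /= (np.linalg.norm(appearance) + 1e-8)   # crude, illustrative descriptor
            feats.append(np.hstack([[x / w, y / h], appearance]))
    return np.array(feats)

def fit_pep_model(training_faces, n_parts=64):
    """Fit one spherical Gaussian component per face "part", as described above."""
    all_feats = np.vstack([extract_spatial_appearance_features(f) for f in training_faces])
    return GaussianMixture(n_components=n_parts, covariance_type="spherical").fit(all_feats)
```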

In accordance with aspects of the invention, the attribute-based representation normalizes characterizations (e.g., aging, pose, illumination, and expressions) of the face upon which it is based. The attribute-based representation and the characterizations can be interdependent, wherein parameters of the attribute-based representation strongly influence the models used for normalization and vice-versa. The attribute-based representation is, therefore, determined by iteratively optimizing it over sets of parameters corresponding to sub-representations.

Further, in accordance with aspects of the invention, the two components of the attribute-based representation (multi-view PEP signatures and attributes) encode information at different levels of abstraction. The 3D model, upon which the multi-view PEP signatures are based, is normalized to overcome limitations of 2D image-based PEP representations by modeling extreme variations for which insufficient training examples are available and accurate statistical models cannot be learned to account for the variations. Furthermore, the domain knowledge used for constructing each component of the attribute-based representation is independently extracted from varied sources and enforced as complementary prior constraints in the attribute-based representation.

The attribute-based representation of the present disclosure provides many advantages. Firstly, the PEP models used to create the multi-view PEP signatures provide pose invariance. Secondly, because PEP models implicitly identify “non-face” patches, the multi-view PEP signatures account for face variations, such as occlusions and low-resolution data, that cannot be directly modeled. Thirdly, the multi-view PEP signatures can assimilate infrared and/or heterogeneous data by using a model that supports non-visual media (e.g., near-infrared, composite sketches, etc.). Fourthly, the multi-view PEP signatures can be extended to all age groups using statistically learned regression functions for image features. Fifthly, the multi-view PEP signature provides resilience to changes in illumination and expression. That is, variations due to illumination and expression are removed by face relighting and expression neutralization when determining the multi-view PEP signatures. In accordance with aspects of the present disclosure, 2D image patches extracted from the multi-view PEP are devoid of such variations because any patches having poor illumination (shadows or saturation) and those that correspond to strong facial expressions are weighed down in the multi-view PEP signatures.

As will be appreciated by one skilled in the art, the present invention may be embodied as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, the present invention may take the form of a computer program product on a computer-usable storage medium having computer-usable program code embodied in the medium.

Any suitable computer usable or computer readable medium may be utilized. The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, transmission media such as those supporting the Internet or an intranet, or a magnetic storage device. Note that the computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-usable medium may include a propagated data signal with the computer-usable program code embodied therewith, either in baseband or as part of a carrier wave. The computer usable program code may be transmitted using any appropriate medium, including but not limited to the Internet, wireline, optical fiber cable, RF, etc.

Computer program code for carrying out operations of the present invention may be written in an object oriented programming language such as Java, Smalltalk, C++ or the like. However, the computer program code for carrying out operations of the present invention may also be written in conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

FIG. 1 is an exemplary environment 100 for implementing methods and systems in accordance with aspects of the present disclosure. The environment 100 includes a facial recognition system 105 and an image source 110. In accordance with aspects of the present disclosure, the facial recognition system 105 is a system that ingests (e.g., obtains) various media (e.g., still pictures, motion pictures, videos, drawings, etc.) including images of an individual and generates a model (e.g., a PEP model) of an individual's face for facial recognition. The system extracts information from the model and uses the extracted information to recognize the individual in other media. The image source 110 is a device or system that captures and/or stores image data, such as video, photographs, pictures, etc. In embodiments, the image source 110 is a media database. Additionally or alternatively, the image source 110 is one or more image sensors (e.g., a camera).

In accordance with aspects of the present disclosure, the facial recognition system 105 includes hardware and software that perform the processes and functions described herein. In particular, the facial recognition system 105 includes a computing device 130, an input/output (I/O) device 133, a storage system 135, and a device selector 137. The I/O device 133 can include any device that enables an individual to interact with the computing device 130 (e.g., a user interface) and/or any device that enables the computing device 130 to communicate with one or more other computing devices using any type of communications link. The I/O device 133 can be, for example, a handheld device, PDA, touchscreen display, handset, keyboard, etc.

The storage system 135 can comprise a computer-readable, non-volatile hardware storage device that stores information and program instructions. For example, the storage system 135 can be one or more flash drives and/or hard disk drives. In accordance with aspects of the present disclosure, the storage system 135 includes an image database 136, a domain knowledge database 137, and a model database 138. The image database 136 can store images and media obtained from the image source 110. The domain knowledge database 137 includes a collection of predetermined models and anthropometric information that can be applied for extracting semantic information from media (e.g., gender, ethnicity, age, face shape, skin type, facial features, etc.) and for modeling a face (e.g., shapes, features, proportions, musculature, and textures corresponding to different genders, ethnicities, and ages). The model database 138 includes 3D face models of individuals, 2D patches extracted from the 3D face models, and extracted attributes that comprise an attribute-based representation.

In embodiments, the computing device 130 includes one or more processors 139, one or more memory devices 141 (e.g., RAM and ROM), one or more I/O interfaces 143, and one or more network interfaces 144. The memory device 141 can include a local memory (e.g., a random access memory and a cache memory) employed during execution of program instructions. Additionally, the computing device 130 includes at least one communication channel (e.g., a data bus) by which it communicates with the I/O device 133, the storage system 135, and the device selector 137. The processor 139 executes computer program instructions (e.g., an operating system and/or application programs), which can be stored in the memory device 141 and/or storage system 135.

Moreover, in accordance with aspects of the present disclosure, the processor 139 can execute computer program instructions of an ingestion module 151, an analysis module 153, a modeling module 155, an extraction module 159, and a matching module 163 to perform one or more of the processes described herein. The ingestion module 151, the analysis module 153, the modeling module 155, the extraction module 159, and the matching module 163 can be implemented as one or more sets of program instructions in the memory device 141 and/or the storage system 135 as separate or combined modules. Additionally, the ingestion module 151, the analysis module 153, the modeling module 155, the extraction module 159, and the matching module 163 can be implemented as separate dedicated processors or a single or several processors to provide the function of these modules.

In accordance with embodiments of the disclosure, the ingestion module 151 causes the computing device 130 to obtain media from the image source 110 and improve images included in the media (e.g., improve resolution, blurring, and contrast). Additionally, the ingestion module 151 causes the computing device to detect and track faces in the images (e.g., using face and eye detecting algorithms).

The analysis module 153 causes the computing device 130 to extract attributes from the faces detected by the ingestion module 151. The attributes semantically describe characteristics of the faces. In embodiments, the attributes are derived characteristics associated with individuals' gender, age, ethnicity, hair color, facial shape, etc. Advantageously, the attributes allow efficient indexing and retrieval of multi-view PEP signatures by providing a flexible, domain-adaptive vocabulary for describing an individual's appearance, thereby reducing search time and data storage requirements.

The modeling module 155 causes the computing device 130 to create or determine a 3D model of an individual's face. In accordance with aspects of the present disclosure, the 3D model is a pose-aware probabilistic elastic part-based (PEP) model generated for all variations of a 3D pose (e.g., a quantized space of yaw and pitch) that compactly encodes shape, texture, and dynamics of the face appearing in a wide range of media modalities and under varied viewing and lighting conditions. Additionally, in accordance with aspects of the present disclosure, the modeling module 155 can relight the 3D model, neutralize a facial expression captured in the 3D model, modify the age of the individual represented by the 3D model, and account for facial decorations and occlusions associated with the 3D model. Further, the modeling module can use the domain knowledge (e.g., in domain knowledge database 137) to fill in information missing from the 3D model (e.g., skin texture and occluded patches).

The extraction module 159 causes the computing device to generate multi-view PEP face signatures using 2D patches and semantic attributes that characterize various demographic groups (e.g., ethnicity, gender, age-group, etc.). In accordance with aspects of the present disclosure, the extraction module determines the 2D patches from projections of the 3D model from multiple poses. The poses can be within a number of predefined viewing-angle ranges having a pitch (e.g., −10 degrees to +10 degrees) and a yaw (e.g., −10 degrees to +10 degrees) with respect to a direct view (e.g., a pitch of zero and a yaw of zero from the frontal view) of the 3D model. The projections are combined to provide the multi-view PEP signatures from dense overlapping 2D face patches corresponding to the poses. In other words, the amount of data included in each of the multi-view PEP face signatures does not change with the quality and/or quantity of available media. Accordingly, the multi-view PEP face signatures can be incrementally refined by incorporating information from additional images without increasing the size of the representation.

Additionally, in accordance with aspects of the present disclosure, the extraction module 159 determines an uncertainty metric for each of the multi-view PEP face signatures. The uncertainty metric characterizes the quality of the 2D patches within each of the multi-view PEP face signatures. The extraction module 159 computes the uncertainty metric using “face-like” measures that can be derived from the 3D model. For example, the metric can correspond to a percentage of the patches corresponding to a particular multi-view PEP face signature that include a non-face part.

Further, in accordance with aspects of the present disclosure, the multi-view PEP face signatures are adaptive to the resolution of the available images. In embodiments, the multi-view PEP face signatures are automatically adjusted to the available resolution of a face image. As such, the greater the available resolution, the more detailed the face representation will be; and the lower the resolution, the less detailed the face representation will be.

Moreover, in accordance with aspects of the present disclosure, the extraction module 159 associates each of the multi-view PEP face signatures with one or more of the attributes. In embodiments, the extraction module 159 appends one or more face-attributes (e.g., ethnicity, age, gender, unique aspects of the face such as ovalness, roundness, etc.) to respective multi-view PEP face signatures. Thus, the attribute-based representation of the present disclosure enables efficient indexing and retrieval of faces using the associated attributes.

The matching module 163 causes the computing device to determine whether a face image matches that of an individual based on the attribute-based representation of an individual's face determined by the modeling module 155. In accordance with aspects of the present disclosure, the matching is based on an uncertainty metric determined for each component of the multi-view probabilistic elastic parts (“multi-view PEP”) signature. Additionally, in accordance with aspects of the present disclosure, the matching module 163 uses domain adaptation to match the multi-view PEP face signatures across imaging modalities. In embodiments, the modalities include RGB spectrum, infrared, hyperspectral, and drawings (e.g., sketches and cartoons), among others.

In embodiments, the domain knowledge database 137 can include the following information that can be referenced by the facial recognition system 105: facial anthropometry, face super-resolution tools, an attribute-specific 3D shape model, attribute-specific multi-view PEP models, attribute extraction tools, feature selection priors, a facial action unit coding system, and domain adaptation tools. Facial anthropometry comprises statistics (mean and standard deviation) of anthropometric measurements that characterize demographic facial information and identify invariant facial features across structural changes due to aging and expressions. Anthropometric measurements estimated from a 3D face model can be used when determining a matching score by the matching module 163, as well as for determining attributes by the analysis module 153. The face super-resolution tools perform component-based matching to exemplar images for enhancing pixel-level details of the face image. The face super-resolution tools provide improved facial feature extraction for building representations by the modeling module 155. The attribute-specific 3D shape model comprises different subspaces modeling modes of variation of 3D face shapes based on ethnicity, gender, and age. These provide more informative priors for fitting a 3D shape by the modeling module 155 compared to generic 3D face shapes. The attribute-specific multi-view PEP models are Gaussian mixture models (GMMs) of patches densely sampled from the images of individuals with a common attribute (e.g., gender, ethnicity, and age group). These provide personalized statistical models used for matching by the matching module 163. The attribute extraction tools are discriminative models (based on deep learning and structured prediction) for detecting attributes from face images by the analysis module 153. The attribute extraction tools model the uncertainty of these attributes, which allows for matching along meaningful aspects of the face. The feature selection priors are deep-learning-based feature selections for achieving invariance to differences in facial features due to, for example, aging, pose, and illumination changes, and for enhancing part-based representation and matching. These allow for faster feature extraction by the extraction module 159 for determining the most relevant and discriminative features. The facial action unit coding system provides universally applicable, intermediate representations of facial musculature dynamics for modeling facial deformations due to expressions by the modeling module 155. The facial action unit coding system provides explicit and accurate modeling of facial musculature. The domain adaptation tools are learned tools that model domain shift across aging, pose, and illumination changes.

It is noted that the computing device 130 can comprise any general purpose computing article of manufacture capable of executing computer program instructions installed thereon (e.g., a personal computer, server, etc.). However, the computing device 130 is only representative of various possible equivalent computing devices that can perform the processes described herein. To this extent, in embodiments, the functionality provided by the computing device 130 can be any combination of general and/or specific purpose hardware and/or computer program instructions. In each embodiment, the program instructions and hardware can be created using standard programming and engineering techniques, respectively.

FIG. 2 illustrates a functional flow diagram of an exemplary process of the facial recognition system 105 in accordance with aspects of the present disclosure. The facial recognition system 105 includes the ingestion module 151, analysis module 153, modeling module 155, extraction module 159, and matching module 163, which can be the same as those previously described. In accordance with aspects of the present disclosure, the ingestion module 151 assesses media received from an image source (e.g., image source 110). The media can include photographs, videos, and/or drawings (e.g., sketches) of an individual. In embodiments, assessing the media includes determining information defining a scale, face coverage (e.g., the portion of the face in an image based on a pose in the image), resolution, modality (e.g., media type), and/or quality of the media including the images. The scale of the face characterizes the image resolution and determines the level of detail that will be extracted by the ingestion module 151. The received images and the associated assessment information can be stored in a database (e.g., image database 136) for subsequent reference and processing.

Additionally, in accordance with aspects of the present disclosure, the ingestion module 151 improves images included in the received media. In embodiments, improving the images includes reducing blurring, improving contrast, and increasing the image resolution. For example, the ingestion module 151 can reduce blurring by estimating an optimal blur kernel based on exemplar structures (eyes, mouth, face contour, etc.) from large pose-variant face datasets. Blur kernel estimation involves identifying the closest exemplar to a blurred face image (e.g., in the domain knowledge database 137) and performing a regularization process that takes in the gradients of the blurred face and the closest exemplar. Still further, the improving can include relighting the images by modeling illumination conditions using statistical learning and geometry. Additionally, the ingestion module 151 can increase the contrast of the images by performing histogram equalization. Further, the ingestion module 151 can use face hallucination techniques to generate high-resolution imagery from low-resolution data.
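
As a small, non-limiting illustration of the contrast-improvement step, the following sketch applies plain histogram equalization with OpenCV; the deblurring, relighting, and face-hallucination steps described above depend on exemplar datasets and learned models and are not reproduced here. The file name in the usage comment is hypothetical.

```python
# Minimal sketch of contrast improvement by histogram equalization (OpenCV).
import cv2

def enhance_contrast(gray_image):
    """gray_image: 8-bit single-channel array; returns an equalized copy."""
    return cv2.equalizeHist(gray_image)

# Usage (hypothetical file name):
# img = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)
# img_eq = enhance_contrast(img)
```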

In accordance with aspects of the present disclosure, the ingestion module 151 also detects and tracks faces included in the received images. In embodiments, the ingestion module 151 detects the eyes and mouth of a face in an image using feature localization techniques, and determines a holistic head pose estimation. For example, the ingestion module 151 can employ an Online Discriminative Feature Selection (ODFS) approach that is based on online adaptation of object appearances using a MILTrack-type algorithm and refines feature selection by maximizing the margin between the average confidences of positive samples and negative samples. The ODFS approach selects features that maximize the confidences of target samples while suppressing the confidences of background samples. It gives greater weight to the most correct positive sample and assigns a smaller weight to the background samples during the classifier update, thereby facilitating effective separation of the foreground target from a cluttered background across changes in scale, pose, illumination, and motion blur. Additionally, the ingestion module 151 can detect and track faces using unsupervised face detection adaptation methods that exploit modeling of social context within a video to further improve the accuracy of face tracking.

In accordance with aspects of the invention, the ingestion module 151 also performs facial feature localization and tracking. The feature localization can be used to estimate the pose of an individual's head in an image and, based on the pose, to determine fiducial points corresponding to the locations of the eyes, mouth, and face (e.g., neckline, chin, and hairline). In embodiments, the ingestion module 151 uses a Supervised Descent Method (SDM). SDM comprises a non-parametric shape model that does not require learning any model of shape or appearance from training data. During the training stage, SDM uses landmarks in the training images and extracts features at the landmark locations. SDM learns from training data a sequence of generic descent directions and bias terms that minimizes the mean of all Normalized Least Squares functions. Advantageously, SDM-based facial feature localization and tracking is computationally very simple (4 matrix multiplications per frame) compared to other such methods, and facilitates tracking facial landmarks with large pose variations (e.g., ±60° yaw, ±90° roll, and ±30° pitch), occlusions, and drastic illumination changes.
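
The following sketch illustrates the SDM inference step at a high level: starting from an initial landmark estimate (e.g., a mean shape), each cascade stage applies a learned linear update to features extracted at the current landmark positions. The feature extractor and the learned (R_k, b_k) stages are placeholders assumed to come from an offline training procedure; this is an illustrative reading of SDM under those assumptions, not the patented implementation.

```python
# Minimal sketch of SDM landmark refinement at test time.
import numpy as np

def sdm_track(image, x0, stages, extract_features):
    """
    x0:               initial landmark vector (e.g., mean shape), shape (2L,)
    stages:           list of (R_k, b_k) learned descent directions and bias terms
    extract_features: callable(image, landmarks) -> feature vector phi
    """
    x = x0.copy()
    for R_k, b_k in stages:          # typically only a few cascade stages per frame
        phi = extract_features(image, x)
        x = x + R_k @ phi + b_k      # one matrix multiplication per stage
    return x
```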

In accordance with aspects of the present disclosure, the analysis module 153 determines attributes from faces in the images that are detected and tracked by the ingestion module 151 based on domain knowledge (e.g., the domain knowledge database 137). The attributes provide an intermediate representation space for assessing similarity between faces by the matching module 163. In embodiments, whereas low-level features are strongly affected by perturbations due to photometric and geometric changes in a scene, the space of describable facial attributes provides a more generalizable metric for establishing correspondences between faces. The attributes can be determined by referencing the fiducial points in an individual's face detected by the ingestion module 151 to features included in a predetermined library of domain knowledge (e.g., domain knowledge database 137). The fiducial points account for variations in the individual's face that may occur due to, for example, posing and aging of the face. In embodiments, feature localization is used for 3D head pose estimation and facial attribute inference. Discriminative models are used for probabilistic inference of attributes from the face images in the media. For example, learned models can be used for detecting both coarse (e.g., gender, ethnicity, and age) and fine (e.g., hair style and color, eyebrow shape, eye color, and mustache) facial attributes. The analysis module 153 can store the attributes of the face in the database (e.g., image database 136) for subsequent reference and processing.

In accordance with aspects of the present disclosure, the modeling module 155 determines a 3D model from the fiducial points and the attributes determined by the analysis module 153. In embodiments, the 3D model encodes shape, texture, and dynamics of the face appearing in a wide range of media modalities and under varied viewing and lighting conditions. The 3D model is composed of a pose-aware probabilistic elastic part-based (PEP) model generated for all variations of 3D pose (a quantized space of yaw and pitch) and specialized according to the demographic attributes (gender, ethnicity, and age-group) extracted from the face.

In embodiments, predefined parameters map 2D images to 3D face shapes. A 3D model is first fitted with a generic 3D mesh and then iteratively refined, based on the demographic attributes (gender and ethnicity), to fit an attribute-specific model. The mapping can be, for example, a look-up table including 3D shapes, rendered 2D images, and corresponding camera parameters. For example, given an image at an arbitrary pose (e.g., within a range of +/−70 degrees yaw and +/−25 degrees pitch), the modeling module 155 can roughly estimate the head pose from the 2D fiducial points. The modeling module 155 can then identify a 3D shape of the face by selecting the generic 3D model with a similar fiducial feature configuration as an initial estimate for the 3D model (e.g., from the domain knowledge database 137). Using the selected 3D model, the modeling module 155 can then use fitting algorithms (e.g., gradient descent) to refine the facial alignment and shape of the 3D face model.

Additionally, in accordance with aspects of the present disclosure, the modeling module 155 relights the 3D model. In embodiments, the modeling module 155 uses 3D face relighting algorithms to support realistic scenarios by extending the training examples used for generating the linear subspace with sufficient illumination variation so that it spans the images taken under uncontrolled illumination conditions. For example, the modeling module 155 can use an illumination database (e.g., the CMU PIE database) to capture the individual's appearance under many different illumination conditions and poses.

Further, in accordance with aspects of the present disclosure, the modeling module 155 neutralizes an expression of the 3D model. In embodiments, to neutralize expressions, the modeling module 155 uses a nonlinear manifold-based approach for modeling 3D facial deformations as a combination of several 1D manifolds (each representing a mode of deformation: smile, surprise, anger, etc.). For example, where a neutral face is considered to be a central point in a high dimensional space, faces of the same individual with varying expressions can be assumed to be points within the neighborhood of that space. To neutralize expressions, the modeling module 155 can use a low-dimensional space that captures the implicit structural relationships between the individual points. These constitute non-linear manifolds. The coordinates on the non-linear manifold correspond to the magnitude of facial deformation along that mode, called a “level of activation”. Using nonlinear manifold learning based on a computational framework that allows for structure inference from sparse data points (e.g., N-D Tensor voting), the modeling module 155 can estimate local normal and tangent spaces of the manifold at each point. The estimated tangent vectors enable the modeling module 155 to directly navigate on the non-linear manifold. For example, the modeling module 155 can use a database comprising 3D facial scans of subjects under different facial expressions (e.g., the Bosphorus Dataset) as the training data in building the manifolds.

Moreover, the neutralizing by the modeling module 155 is also implicitly performed by determining the 3D model of the face in accordance with aspects of the present disclosure. That is, the 3D model associates every face patch with a generative probability that measures its closeness to corresponding patches from the neutral face images that the 3D model is based on. Hence, the 3D model down-weighs facial patches that are affected by facial expressions.

Also, in accordance with aspects of the present disclosure, the modeling module 155 determines aging of the individual represented by the 3D model. Aging effects can be characterized as a combination of shape variations (e.g., cranial growth, sagging features) and textural variations (e.g., skin wrinkles). In embodiments, the modeling module 155 extrapolates the 3D shape and texture model to account for aging. For example, the modeling module 155 can determine PEP models for different age groups (e.g., teenage (<20), young adult (20 to 35 yrs), middle-aged adult (35 to 50 yrs), and senior adult (50 and above)). The age-group based PEP models provide a unified framework to characterize patch-based appearance variations across age groups. In embodiments, the modeling module 155 limits the learning of age-group based PEP models to the frontal pose bin, using frontal face images of subjects belonging to that age-group, due to a lack of sufficient face aging datasets across pose.

Notably, in accordance with aspects of the present disclosure, the 3D model determined by the modeling module 155 accounts for facial decorations and occlusions. The facial decorations and occlusions are implicitly removed under the attribute-based face representation. That is, the 3D model is built using faces with no facial decorations and occlusions. The patches selected based on high probabilities of the components in the model are therefore those without facial hair and with appearance similar to the appearance of the training example patches. For example, in determining the 3D model, the modeling module 155 uses skin texture modeling to selectively extract 2D skin patches from an image and update the holistic skin texture of a 3D mesh. Thus, the skin of the 3D model lacks facial hair. Instead, the attributes for the individual determined by the analysis module 153 characterize the presence of facial hair, which can be used to characterize the 3D model.

In accordance with aspects of the present disclosure, the extraction module 159 extracts 2D patches from the 3D model that correspond to different ranges of poses. In embodiments, the extraction module 159 densely samples 2D patches from images rendered for each of a number of pose-bins. The 2D patches can have varying sizes (e.g., resolutions). For example, the extraction module 159 can extract 2D patches at a number of size levels (e.g., 10), wherein each size level is progressively smaller (e.g., 80% of the previous level). Further, for each size level, the extraction module 159 samples the face image in a step-wise fashion (e.g., each step is one-half of the 2D patch width). Depending on how the pose-bins are populated (e.g., using patches from the observed image, patches extrapolated using regression, or patches rendered from the normalized 3D model), different uncertainty metrics are associated with them based on the quality and/or quantity of the respective data used to determine the 2D patches.
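
A minimal sketch of this dense multi-scale sampling is shown below; the number of levels, the 80% shrink factor, and the half-width stride follow the example values in the text, while the base patch size is an assumed parameter.

```python
# Dense multi-scale patch sampling over a rendered face image (illustrative values).
def sample_patches(face_image, base_size=32, levels=10, shrink=0.8):
    h, w = face_image.shape[:2]
    patches = []
    size = float(base_size)
    for _ in range(levels):
        s = max(int(round(size)), 4)
        step = max(s // 2, 1)                       # step is one-half of the patch width
        for y in range(0, h - s + 1, step):
            for x in range(0, w - s + 1, step):
                patches.append(face_image[y:y + s, x:x + s])
        size *= shrink                              # next level is 80% of this one
    return patches
```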

In accordance with aspects of the present disclosure, the matching module 163 determines matches between an input image (e.g., an image captured of a wrongdoer at an event) and the 2D patches extracted by the extraction module 159. Similarity between the input image and the gallery media is computed as matching scores between the heterogeneous signatures of their representations. In embodiments, the matching module 163 uses a combined indexing and matching scheme to match multi-view PEP signatures and account for the uncertainties of each of the components. In accordance with aspects of the present disclosure, visual attributes used to describe a face provide an intermediate representation space for assessing similarity between faces. Whereas low-level features are strongly affected by perturbations due to photometric and geometric changes in the scene, the space of describable facial attributes provides a more generalizable metric for establishing correspondences between faces.
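
As a non-limiting illustration of uncertainty-aware matching, the sketch below scores two multi-view PEP signatures as an uncertainty-weighted average of per-pose-bin cosine similarities. The per-bin layout and the weighting rule are assumptions made for the example, not the specific scheme claimed in the disclosure.

```python
# Uncertainty-weighted matching between two multi-view PEP signatures (sketch).
import numpy as np

def match_score(sig_a, sig_b, uncertainty_a, uncertainty_b):
    """sig_*: arrays of shape (n_bins, dim); uncertainty_*: arrays of shape (n_bins,) in [0, 1]."""
    scores, weights = [], []
    for a, b, ua, ub in zip(sig_a, sig_b, uncertainty_a, uncertainty_b):
        cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
        w = (1.0 - ua) * (1.0 - ub)      # down-weight bins with high uncertainty
        scores.append(cos)
        weights.append(w)
    weights = np.array(weights)
    return float(np.dot(scores, weights) / (weights.sum() + 1e-8))
```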

The flow diagrams in FIGS. 3-6 illustrate the functionality and operation of possible implementations of systems, devices, methods, and computer program products according to various embodiments of the present disclosure. Each block in the flow diagrams of FIGS. 3-6 can represent a module, segment, or portion of program instructions, which includes one or more computer executable instructions for implementing the illustrated functions and operations. In some alternative implementations, the functions and/or operations illustrated in a particular block of the flow diagrams can occur out of the order shown in FIGS. 3-6. For example, two blocks shown in succession can be executed substantially concurrently, or the blocks can sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the flow diagrams, and combinations of blocks in the flow diagrams, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

FIG. 3 illustrates a flow diagram of an exemplary process 300 for ingesting, modeling, extracting, and matching images in accordance with aspects of the present disclosure. The steps of FIG. 3 can be implemented using the facial recognition system (e.g., facial recognition system 105) of FIG. 1 to obtain images from, for example, an image source (e.g., image source 110), and to process the obtained images to perform facial recognition.

At step 303, the facial recognition system (e.g., via ingestion module 151) obtains one or more images of an individual. For example, the facial recognition system can obtain a number of different images including images of an individual's face from image sources (e.g., image source 110), such as a camera and/or an image database. The images can be stored in a database (e.g., image database 136) for reference and processing by the facial recognition system.

At step 305, the facial recognition system (e.g., via analysis module 153) determines attributes from the images. In accordance with aspects of the invention, the attributes semantically describe characteristics of the subject. In embodiments, the attributes are determined based on predefined information and models (e.g., domain knowledge database 137).

At step 307, the facial recognition system (e.g., via modeling module 155) determines a 3D model of the individual's face using the images. For example, the modeling module 155 may select a 3D mesh from a library (e.g., domain knowledge database 137) based on the attributes determined at step 305 and populate the mesh with patches of the images obtained in step 303. In embodiments, the facial recognition system can identify elements of the 3D model lacking information from the plurality of images. If the 3D model lacks any of the elements, the facial recognition tool can provide the information for the identified elements using domain knowledge (e.g., domain knowledge database 137) compiled from individuals having attributes that are similar to the attributes of the subject or target individual.

At step 309, the facial recognition system (e.g., via modeling module 155) normalizes the 3D model determined at step 307. Normalizing can include relighting the 3D model to normalize lighting variations in the face represented by the 3D model. Additionally, the normalizing can include neutralizing an expression of the face represented by the 3D model, modifying an age of the face represented by the 3D model, and accounting for facial decorations and occlusions associated with the 3D model, as previously described herein.

At step 311, the facial recognition system (e.g., via extraction module 159) extracts 2D patches from the 3D model normalized in step 309 corresponding to different poses of the face. For example, each of the different poses can correspond to respective viewing angle ranges of the 3D model. For each viewing angle range, the facial recognition system can determine a number of visible patches and store information of the patches in a database (e.g., in model database 138) in association with the respective viewing angle ranges.

At step 313, the facial recognition system (e.g., via extraction module 159) determines multi-view PEP signatures for the different poses used in step 311. In embodiments, the multi-view PEP signatures correspond to respective viewing angle ranges of the 3D model from different angles. In embodiments, the facial recognition system iteratively refines the multi-view PEP signatures using a number of additional face images. However, in accordance with aspects of the invention, each of the multi-view PEP signatures has a fixed size irrespective of the number of additional face images. Also, in accordance with aspects of the invention, the facial recognition system determines one of the multi-view PEP signatures corresponding to a portion of the face having a greatest discriminative feature with respect to other features of the face. In embodiments, the determination of the portion of the face having a greatest discriminative feature is made using a convolutional neural network that is trained with data to perform facial feature selection. For example, based on training data, the convolutional neural network can be used to determine an uncertainty metric for each portion and select a corresponding portion of the face having the smallest uncertainty metric.

At step 315, the facial recognition system (e.g., via extraction module 159) indexes the multi-view PEP signatures with the attributes determined in step 305. In embodiments, for a particular multi-view PEP signature, an attribute can be indexed by converting it to a vector that is treated as a component of the multi-view PEP signature. For example, the indexing can be performed using an Optimized Transform Coding method.
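
A simplified sketch of treating attributes as an additional signature component appears below: the attributes are encoded as a fixed-length weighted vector and appended to the signature. The attribute vocabulary is illustrative, and the Optimized Transform Coding step mentioned above is not reproduced.

```python
# Encoding attributes as a vector component appended to a PEP signature (sketch).
import numpy as np

ATTRIBUTE_VOCAB = ["male", "female", "caucasian", "asian", "african",
                   "teen", "young_adult", "middle_aged", "senior"]   # illustrative vocabulary

def attributes_to_vector(attributes):
    """attributes: dict mapping attribute name -> confidence weight in [0, 1]."""
    vec = np.zeros(len(ATTRIBUTE_VOCAB))
    for name, weight in attributes.items():
        if name in ATTRIBUTE_VOCAB:
            vec[ATTRIBUTE_VOCAB.index(name)] = weight
    return vec

def index_signature(pep_signature, attributes):
    """Append the attribute vector as an extra component of the signature."""
    return np.concatenate([pep_signature, attributes_to_vector(attributes)])
```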

At step 317, the facial recognition system (e.g., via extraction module 159) associates one or more uncertainty metrics with each of the multi-view PEP signatures. The uncertainty metrics can be values determined based on the quality of the information used to generate each of the multi-view PEP signatures (e.g., due to occlusions, facial decorations, lighting, and viewing angle). The multi-view PEP signatures can be stored in a database (e.g., model database 138) in association with their respective attributes determined at step 315 and the respective uncertainty metrics determined at step 317.

At step 319, the facial recognition system (e.g., via matching module 163) determines whether an input image matches a face of an individual that has been modeled based on the attributes determined in step 305, the multi-view PEP signatures determined in step 315, and the uncertainty metrics determined at step 317. In embodiments, the determination includes modifying a resolution of the plurality of signatures based on a resolution of the input image. Additionally, in embodiments, the determination includes performing the matching using a plurality of imaging modalities. For example, the matching may be performed using PEP signatures corresponding to visible spectrum images, infrared images, and/or drawings.

FIG. 4 illustrates a flow diagram for determining an attribute-based representation using a facial recognition system (e.g., facial recognition system 105) in accordance with aspects of the present disclosure. The facial recognition system can be the same as that previously described herein. At step 403, the facial recognition system receives one or more images 405 of an individual from one or more sources (e.g., image source 110). At step 407, the facial recognition system (e.g., using modeling module 155) determines a 3D model of the individual's face. The 3D model can be based on a standard shape that is selected based on attributes of the individual (e.g., gender, age, ethnicity, etc.) that are extracted from the received images (e.g., using analysis module 153). Further, the facial recognition system can modify the representation of the individual's face in the 3D model by relighting the model, normalizing a facial expression, and/or aging the face, as previously described herein.

At step 409, the facial recognition system 105 (e.g., using extraction module 159) determines multi-view PEP signatures from the 3D model determined at step 407 by extracting 2D patches corresponding to a number of different poses of the 3D model. Each of the poses can correspond to a viewing angle of the 3D model based on a different combination of pitch and yaw ranges. For example, a first combination can include a pitch range of −15 degrees to 15 degrees and a yaw range of 10 degrees to 40 degrees; a second combination can include a pitch range of −10 degrees to +10 degrees and a yaw range of −90 degrees to −75 degrees; a third combination can include a pitch range of −10 degrees to +10 degrees and a yaw range of −45 degrees to −15 degrees; a fourth combination can include a pitch range of −10 degrees to +10 degrees and a yaw range of −15 degrees to +15 degrees; a fifth combination can include a pitch range of −10 degrees to +10 degrees and a yaw range of +15 degrees to +45 degrees; a sixth combination can include a pitch range of −10 degrees to +10 degrees and a yaw range of +75 degrees to +90 degrees; and a seventh combination can include a pitch range of −40 degrees to −10 degrees and a yaw range of −15 degrees to +15 degrees. In accordance with aspects of the present disclosure, the multi-view PEP signatures are determined for a plurality of image modalities 413 (e.g., visible spectrum, infrared, and sketch/cartoon).
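
For illustration only, the pose-bin bookkeeping implied by the example combinations above could be represented as simple (pitch range, yaw range) pairs with a lookup function, as in the following sketch; the ranges mirror the example combinations listed in the text.

```python
# Pose bins as (pitch range, yaw range) pairs in degrees, mirroring the example above.
POSE_BINS = [
    ((-15, 15), (10, 40)),
    ((-10, 10), (-90, -75)),
    ((-10, 10), (-45, -15)),
    ((-10, 10), (-15, 15)),
    ((-10, 10), (15, 45)),
    ((-10, 10), (75, 90)),
    ((-40, -10), (-15, 15)),
]

def pose_to_bin(pitch, yaw):
    """Return the index of the first bin containing the pose, or None if outside all bins."""
    for i, ((p_lo, p_hi), (y_lo, y_hi)) in enumerate(POSE_BINS):
        if p_lo <= pitch <= p_hi and y_lo <= yaw <= y_hi:
            return i
    return None
```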

At step 415, the facial recognition system (e.g., using extraction module 159) populates a plurality of bins 417 (e.g., bins 0-8) corresponding, respectively, to each of the multi-view PEP signatures determined for the different poses (e.g., poses 0-8) used in step 409. Additionally, each of the bins 417 is indexed by one or more attributes 419. Further, each of the bins 417 is associated with a respective uncertainty metric 421. In accordance with aspects of the present disclosure, an individual can be identified based on the similarity between an image of the individual and the multi-view PEP signatures determined for the different poses.

FIG. 5 illustrates a flow diagram for determining attributes by a facial recognition system (e.g., facial recognition system 105) in accordance with aspects of the present disclosure. The attributes can be determined by an analysis module 153 of the facial recognition system, which can be the same as that previously discussed herein. At step 503, the analysis module 153 can detect an individual's face in an image, as previously described herein. The detected face can be associated with a pose. At step 505, the analysis module 153 can determine fiducial points in the face detected at step 503, as previously described herein. At step 507, the analysis module 153 can determine 2D patches from within the face based on the fiducial points determined at step 505.

Further, at step 509, the analysis module 153 can classify attributes of the face detected in step 503 (e.g., pose) and in the 2D patches determined at step 507. For example, based on the face and the 2D patches, the analysis module 153 uses a linear classifier that associates the semantics “male,” “Caucasian,” “pointy nose,” and “glasses” with the image. Each of the semantics may have an associated weight corresponding to a certainty of the determination. For example, a weight associated with the semantic “male” is greater when the analysis module 153 determines that the gender of the individual in the image is certainly male, and the weight can be lower when the analysis module 153 determines that the gender of the individual in the image is not clearly male. In embodiments, the certainty can be determined based on a similarity determined by comparison of fiducial points in the images and reference data (e.g., in domain knowledge database 137).
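
A minimal sketch of this per-attribute classification with confidence weights is shown below: one linear classifier per semantic attribute, with the classifier's probability serving as the weight described above. The feature extraction, the training data, and the binary labeling scheme are assumptions of this example.

```python
# One linear classifier per attribute; predicted probability acts as the confidence weight.
from sklearn.linear_model import LogisticRegression

def train_attribute_classifiers(features, labels_by_attribute):
    """features: array (n_samples, n_features); labels_by_attribute: dict name -> binary labels."""
    return {name: LogisticRegression(max_iter=1000).fit(features, labels)
            for name, labels in labels_by_attribute.items()}

def classify_attributes(classifiers, feature_vector):
    """Returns dict of attribute name -> confidence weight in [0, 1]."""
    return {name: float(clf.predict_proba([feature_vector])[0, 1])
            for name, clf in classifiers.items()}
```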

In embodiments, the analysis module 153 determines the attributes using a convolutional neural net (CNN) that identifies a pose-specific PEP representation corresponding to a range of yaw and pitch values of a head pose. By decomposing the image into the 2D patches in step 507 based on parts that are pose-specific, the subsequent training of the convolutional neural net is substantially easier. Accordingly, the analysis module 153 can determine pose-normalized features from relatively small datasets. In addition to low-level features, the image patches used for establishing correspondence (or matching) between a pair of representations depend on the 3D pose (yaw and pitch), and can be learned independently for each 3D pose using the convolutional neural net. Further, the analysis module 153 may use a model that augments deep convolutional networks to have input layers based on semantically aligned part patches. This model learns features that are specific to a certain attribute under a certain pose. The analysis module 153 can then combine the attributes produced by such networks and construct a pose-normalized deep representation. The analysis module 153 integrates a deep learning architecture in the multi-view PEP based representation, which is trained to support media with varied resolution, quality, and conditions (e.g., age, pose, illumination).

FIG. 6 illustrates a flow diagram for a process performed by a facial recognition system (e.g., facial recognition system 105) for determining a multi-view PEP signature in accordance with aspects of the present disclosure. The multi-view PEP signature can be determined by the extraction module 159, which can be the same as that previously discussed herein.

At step 603, the extraction module 159 extracts local descriptors from a 3D model, which may be the same as previously described. At step 605, the extraction module 159 determines components of a PEP model. In accordance with aspects of the invention, from the training images (e.g., in image database 136), the modeling module 155 extracts spatial-appearance local descriptors and fits a Gaussian mixture model, constraining the Gaussian components to be spherical. The extraction module 159 can determine the model parameters using Expectation-Maximization (EM). The PEP model effectively handles pose variations based on a part-based representation, and handles variations from other factors using invariant local descriptors.

At step 607, the extraction module 159 determines maximum likelihood part descriptors from among the components of the PEP model determined in step 605. For example, each Gaussian component (representing a face part) of the determined PEP model selects the local image descriptor with the highest likelihood of having arisen from that component's parameters.

At step 609, the extraction module 159 determines a PEP signature from the maximum likelihood part descriptors determined at step 607. To determine a final representation, the extraction module 159 can concatenate the selected descriptors from all components. To handle real-world conditions, the extraction module 159 extends the PEP model described above into a pose-aware PEP model, whereby the modeling module 155 discretizes the yaw-pitch pose space into different pose bins and obtains a different PEP model and representation for each. The ensemble of all the PEP models leads to an ensemble PEP representation that can more effectively model a larger range of pose variations. The extraction module 159 performs metric learning for each individual PEP representation in the ensemble and naturally adopts the generative probability of the input face images with respect to each individual PEP model to adaptively weight the metrics defined upon each individual PEP representation.
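
For illustration, the sketch below follows steps 605-609 with an off-the-shelf spherical Gaussian mixture: each mixture component keeps the single descriptor it explains best, and the kept descriptors are concatenated into the signature along with their probabilities. The use of posterior responsibilities, the variable names, and the sklearn API are assumptions of this example rather than the claimed implementation.

```python
# Building a PEP signature by per-component maximum-likelihood descriptor selection (sketch).
import numpy as np

def pep_signature(gmm, descriptors):
    """gmm: fitted sklearn GaussianMixture; descriptors: array (n, dim) for one face."""
    # Per-descriptor, per-component responsibilities (posterior probabilities).
    resp = gmm.predict_proba(descriptors)                 # shape (n, n_components)
    best = resp.argmax(axis=0)                            # best descriptor index per component
    parts = [descriptors[best[k]] for k in range(gmm.n_components)]
    probs = resp[best, np.arange(gmm.n_components)]       # probability of each kept descriptor
    return np.concatenate(parts), probs
```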

With every additional face image of a subject, the extraction module 159 aggregates the part descriptors using soft-max aggregation. By obtaining a weighted sum of all the maximum likelihood part descriptors from all face images, where the weight of each maximum likelihood part descriptor is set by a multinomial soft-max function using the probability of the descriptor associated with the corresponding part, the PEP model enables incremental and reversible updates of descriptors. Simultaneously recording the probability of each maximum likelihood part descriptor enables flexibly updating an existing representation by either adding the maximum likelihood descriptor from additional new images, or removing the maximum likelihood descriptor from a subset of existing images which have been used to produce the existing representation, without the need to access all the original images. Further, soft-max aggregation based updates allow the Pose-aware PEP representation to be fixed in size.
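A minimal sketch of this bookkeeping, assuming a simple exponential soft-max weighting, is shown below: keeping the running weighted sum and total weight per part makes the aggregate both incremental (add an image) and reversible (remove an image) while its size stays fixed. The class name and temperature parameter are assumptions for illustration.

```python
# Sketch of soft-max aggregation of maximum likelihood part descriptors
# across multiple face images of one subject.
import numpy as np

class SoftmaxPartAggregator:
    def __init__(self, dim, temperature=1.0):
        self.t = temperature
        self.weighted_sum = np.zeros(dim)   # sum_i exp(logp_i / t) * d_i
        self.weight_total = 0.0             # sum_i exp(logp_i / t)

    def add(self, descriptor, log_prob):
        """Incremental update from one additional image's part descriptor."""
        w = np.exp(log_prob / self.t)
        self.weighted_sum += w * descriptor
        self.weight_total += w

    def remove(self, descriptor, log_prob):
        """Reversible update: subtract the recorded contribution of an image."""
        w = np.exp(log_prob / self.t)
        self.weighted_sum -= w * descriptor
        self.weight_total -= w

    def value(self):
        """Fixed-size aggregated descriptor, regardless of how many images were added."""
        return self.weighted_sum / max(self.weight_total, 1e-12)
```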

In accordance with aspects of the invention, the pose-aware PEP based 2D representation is a three-part representation, with one part each for imagery from the visible spectrum, imagery from the near-infrared spectrum, and composite sketches (or cartoons). For each type of representation, the extraction module 159 estimates an uncertainty metric, which is associated with the signature derived from the patch, based on generative probabilities. Such an uncertainty metric can assist in accurately matching signatures with individuals.
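The disclosure does not fix a formula for this metric; as one assumed, illustrative possibility, a patch's uncertainty could be taken as the fraction of reference (training) patches whose generative log-probability exceeds the patch's own, with uncertain patches down-weighted during matching.

```python
# Illustrative sketch (an assumption, not the disclosed formula) of deriving a
# per-patch uncertainty value from generative probabilities and using it to
# weight a signature match.
import numpy as np

def patch_uncertainty(log_prob, ref_log_probs):
    """Fraction of reference patches that scored higher than this patch (0 = very certain)."""
    ref = np.asarray(ref_log_probs)
    return float((ref > log_prob).mean())

def weighted_match_score(dists, uncertainties):
    """Down-weight distances contributed by uncertain patches when matching signatures."""
    w = 1.0 - np.asarray(uncertainties)
    w = w / max(w.sum(), 1e-12)
    return float((w * np.asarray(dists)).sum())
```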

The present disclosure is not to be limited in terms of the particular embodiments described in this application, which are intended as illustrations of various aspects. Many modifications and variations can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the disclosure, in addition to those enumerated herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the appended claims. The present disclosure is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.

With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.

It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations). Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.” In addition, where features or aspects of the present disclosure are described in terms of Markush groups, those skilled in the art will recognize that the disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.

While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

What is claimed is:
1. A method comprising: determining a three-dimensional (3D) model of a face of an individual based on a plurality of different images of the individual; extracting two-dimensional (2D) patches from the 3D model; and generating a plurality of signatures of the face using different combinations of the 2D patches, wherein the plurality of signatures correspond to respective views of the 3D model from different angles.
2. The method of claim 1, wherein the determining the 3D model comprises: identifying elements of the 3D model lacking information from the plurality of images; and providing the information for the identified elements using domain knowledge compiled from individuals having attributes similar to those of the individual.
3. The method of claim 1, further comprising modifying the 3D model by normalizing lighting variations in the 3D model.
4. The method of claim 1, further comprising neutralizing a facial expression resulting from the plurality of different images of the individual.
5. The method of claim 1, further comprising modifying the 3D model based on an age of the individual.
6. The method of claim 1, further comprising determining a plurality of attributes of the individual that semantically describe characteristics of the individual.
7. The method of claim 6, further comprising indexing the plurality of signatures based on the plurality of attributes.
8. The method of claim 1, further comprising determining respective uncertainty values for the plurality of signatures, wherein the uncertainty values are based on a quality of respective 2D patches included in the plurality of signatures.
9. The method of claim 1, further comprising determining that a face image matches at least one of the plurality of signatures.
10. The method of claim 9, wherein the determining that the face image matches comprises modifying a resolution of the plurality of signatures based on a resolution of the face image.
11. The method of claim 9, wherein the determining that the face image matches comprises matching using a plurality of imaging modalities.
12. The method of claim 1, wherein the plurality of signatures of the face are iteratively refined using a number of additional face images of the individual.
13. The method of claim 12, wherein the plurality of signatures of the face has a fixed size irrespective of the number of additional face images.
14. The method of claim 1, further comprising: determining uncertainty metrics corresponding, respectively, to the plurality of signatures; and associating the plurality of signatures with the corresponding uncertainty metrics.
15. The method of claim 1, further comprising determining which of the plurality of signatures corresponds to a portion of the face having a greatest number of discriminative features.
16. A facial recognition system comprising: a processor; a storage system; program instructions stored on the computer-readable hardware storage device for execution by the processor, the program instructions comprising: program instructions that determine a three-dimensional (3D) model of a face of an individual based on a plurality of different images of the individual; program instructions that extract two-dimensional (2D) patches from the 3D model; and program instructions that generate a plurality of signatures of the face using different combinations of the 2D patches, wherein the plurality of signatures correspond to respective views of the 3D model from different angles.
17. The system of claim 16, wherein the determining the 3D model comprises: identifying elements of the 3D model lacking information from the plurality of images; and providing the information for the identified elements using domain knowledge compiled from individuals having attributes similar to those of the individual.
18. The system of claim 16, further comprising modifying the 3D model by normalizing lighting variations in the 3D model.
19. The system of claim 16, further comprising normalizing a facial expression resulting from the plurality of different images of the individual.
20. The system of claim 16, further comprising modifying the 3D model based on an age of the individual.
21. The system of claim 16, further comprising determining a plurality of attributes of the individual that semantically describe characteristics of the individual.
22. The system of claim 21, further comprising indexing the plurality of signatures based on the plurality of attributes.
23. The system of claim 16, further comprising determining respective uncertainty values for the plurality of signatures, wherein the uncertainty values are based on a quality of respective 2D patches included in the plurality of signatures.
24. The system of claim 16, further comprising determining that a face image matches at least one of the plurality of signatures.
25. The system of claim 24, wherein the determining that the face image matches comprises modifying a resolution of the plurality of signatures based on a resolution of the face image.
26. The system of claim 24, wherein the determining that the face image matches comprises matching using a plurality of imaging modalities.
27. The system of claim 16, wherein the plurality of signatures of the face are iteratively refined using a number of additional face images of the individual.
28. The system of claim 16, wherein the plurality of signatures of the face has a fixed size irrespective of the number of additional face images.
29. The system of claim 16, further comprising: determining uncertainty metrics corresponding, respectively, to the plurality of signatures; and associating the plurality of signatures with the corresponding uncertainty metrics.
30. The system of claim 16, further comprising determining which of the plurality of signatures corresponds to a portion of the face having a greatest number of discriminative features.